Abstract

We consider approximating data structures with collections of the
items that they contain. For example, lists, binary trees, tuples, etc.,
can be approximated by sets or multisets of the items within them.
Such approximations can be used to provide partial correctness
properties of logic programs. For example, one might wish to
specify that whenever the atom sort(t, s) is proved then the two
lists t and s contain the same multiset of items (that is, s is a
permutation of t). If sorting removes duplicates, then one would
like to infer that the sets of items underlying t and s are the same.
Such results could be useful to have if they can be determined
statically and automatically. We present a scheme by which such
collection analysis can be structured and automated. Central to
this scheme is the use of linear logic as a computational logic
underlying the logic of Horn clauses.

Categories and Subject Descriptors F.4.1 [Mathematical Logic]: 
Computational logic; I.2.3 [Deduction and Theorem Proving]:
Logic programming 

General Terms Design, Theory, Verification 

Keywords proof search, static analysis, Horn clauses, linear logic 

1. Introduction 

Static analysis of logic programs can provide useful information for
programmers and compilers. Typing systems, such as in λProlog
[23, 24], have proved valuable during the development of code:
type errors often represent program errors that are caught at compile
time when they are easier to find and fix than at runtime when
they are much harder to repair. Static type information also provides
valuable documentation of code since it provides a concise
approximation to what the code does.

In this paper we describe a method by which it is possible to
infer that certain relationships concerning collections underlying
structured data hold. We shall focus on relations that are decidable
and that can be established during compile-time analysis of logic
programs. We shall use multisets and sets to approximate more
complicated structures such as lists and binary trees. Consider, for
example, a list sorting program that maintains duplicates of elements.
Part of the correctness of a sort program includes the fact that if the
atomic formula sort(t, s) is provable, then s is a permutation of
t that is in-order. The proof of such a property is likely to involve
inductive arguments requiring the invention of invariants: in other
words, this is not likely to be a property that can be inferred
statically during compile time. On the other hand, if the lists t and s
are approximated by multisets (that is, if we forget the order of items
in lists), then it might be possible to establish that if the atomic
formula sort(t, s) is provable, then the multiset associated to s is
equal to the multiset associated to t. If that is so, then it is
immediate that the lists t and s are, in fact, permutations of one
another (in other words, no elements were dropped, duplicated, or
created during sorting). As we shall see, such properties based on using
multisets to approximate lists can often be established statically.
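The approximation described above can be illustrated with a small sketch. The code below is not part of the analysis developed in this paper; it only shows, at run time and on concrete lists, what it means for two lists to agree as multisets or as sets. The helper names `same_multiset` and `same_set` are hypothetical.

```python
from collections import Counter

def same_multiset(t, s):
    """True when t and s contain the same items with the same multiplicities."""
    return Counter(t) == Counter(s)

def same_set(t, s):
    """True when t and s contain the same items, ignoring multiplicities."""
    return set(t) == set(s)

t = [3, 1, 2, 1]
keeps_duplicates = sorted(t)        # a sort that maintains duplicates
drops_duplicates = sorted(set(t))   # a sort that removes duplicates

print(same_multiset(t, keeps_duplicates))  # the output is a permutation of t
print(same_multiset(t, drops_duplicates))  # multiplicities differ
print(same_set(t, drops_duplicates))       # but the underlying sets agree
```

A sort that maintains duplicates should satisfy the multiset property, while a sort that removes duplicates can only be expected to satisfy the weaker set property.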

This paper considers exclusively the static analysis of first-order 
Horn clauses but it does so by making substitution instances of 
such Horn clauses that carry them into linear logic. Proofs for the 
resulting linear logic formulas are then attempted as part of static 
analysis. 

2. The undercurrents 

There are various themes that underlie our approach to inferring 
properties of Horn clause programs. We list them explicitly below. 
The rest of the paper can be seen as a particular example of how 
these themes can be developed. 

2.1 If typing is important, why use only one? 

Types and other static properties of programming languages have
proved important on a number of levels. Typing can be useful for
programmers: types can offer important invariants and documentation
for code. Static analysis can also be used by compilers to uncover
useful structures that allow compilers to make choices that can improve
execution. While compilers might make use of multiple static analysis
regimes, programmers do not usually have convenient access
to multiple static analyses for the code that they are composing.
Sometimes, a programming language provides no static analysis,
as is the case with Lisp and Prolog. Other programming languages
offer exactly one typing discipline, such as the polymorphic typing
disciplines of Standard ML and λProlog (SML also statically
determines if a given function defined over concrete data structures
covers all possible input values). It seems clear, however, that
such analysis of code, if it can be done quickly and incrementally,
might have significant benefits for programmers during the process
of writing code. For example, a programmer might find it valuable
to know that a recursive program that she has just written has linear
or quadratic runtime complexity, or that a relation she just specified
actually defines a function. The Ciao system preprocessor [14]
provides for such functionality by allowing a programmer to write
various properties about code that the preprocessor attempts to verify.
Having an open set of properties and analysis tools is an interesting
direction for the design of a programming language. The collection
analysis we discuss here could be just one such analysis tool.

2.2 Logic programs as untyped λ-expressions

If we do not commit to just one typing discipline, then it seems
sensible to use a completely untyped setting for encoding programs
and declarations. Given that untyped λ-terms provide for arbitrary
applications and arbitrary abstractions, such terms can provide an
appealing setting for the encoding of program expressions, type
expressions, assertions, invariants, etc. Via the well developed theory
of λ-conversion, such abstractions can be instantiated with a variety
of other objects. Abstractions can be used to encode quantifiers
within formulas as well as binding declarations surrounding entire
programs.

In logic programming, proofs can be viewed as computation
traces and such proof objects can also be encoded as untyped
λ-terms. Instantiation into proofs is also well understood since it is
closely related to the elimination of cut in sequent calculus or to
normalization in natural deduction proofs. The fact that proofs and
programs can be related simply in a setting where substitution into
both has well understood properties is certainly one of the strengths
of the proof theoretic foundations of logic programming (see, for
example, [22]).

2.3 What good are atomic formulas? 

In proof theory, there is an interesting problem of duality involving
atomic formulas. The initial rule and the cut rule given as

\[
\frac{}{C \vdash C}\ \text{Initial}
\qquad
\frac{\Gamma_1 \vdash C, \Delta_1 \qquad \Gamma_2, C \vdash \Delta_2}{\Gamma_1, \Gamma_2 \vdash \Delta_1, \Delta_2}\ \text{Cut}
\]

can be seen as being dual to each other [13]. In particular, the
initial rule states that an occurrence of a formula on the left is
stronger than the same occurrence on the right, whereas the cut
rule states the dual: an occurrence of a formula on the right is
strong enough to remove the same occurrence from the left. In most
well designed proof systems, all occurrences of the cut rule can be
eliminated (whether or not $C$ is an atomic formula) whereas only
non-atomic initial rules (where $C$ is non-atomic) can be eliminated.
Atoms seem to spoil the elegant duality of the meta-theory of these
inference rules.

While the logic programming world is most comfortable with
the existence of atomic formulas, there have been a couple of recent
proof theoretic approaches that try to eliminate them entirely. For
example, in the work on definitions and fixed points by
Schroeder-Heister [26], Girard [11], and McDowell & Miller [17], atoms are
defined to be other formulas. In this approach, the only primitive
judgment involving terms is that of equality. In that setting, if
definitions are stratified (no recursion through negations) and
noetherian (no infinite descent in recursion), then all instances of cut
and initial can be removed. The setting of Ludics of Girard [12] is a more
radical presentation of logic in which atomic formulas do not exist:
formulas can be probed to arbitrary depth to uncover "subformulas".

Another approach to atoms is to consider all constants as being 
variables. On one hand this is a trivial position: if there are no con- 
stants (thus, no predicate constants) there are no atomic formulas 
(which are defined as formulas with non-logical constants at their 
head). On the other hand, adopting a point-of-view that constants 
can vary has some appeal. We describe this next. 

2.4 Viewing constants and variables as one 

The inference rule of $\forall$-generalization states that if $B$ is provable
then $\forall x.B$ is provable (with appropriate provisos if the proof of $B$
depends on hypotheses). If we are in a first-order logic, then the
free first-order variable $x$ of $B$ becomes bound in $\forall x.B$ by this
inference rule.

Observe the following two things about this rule. First, if we 
are in an untyped setting, then we can, in principle, quantify over 
any variable in any expression, even those that play the role of 
predicates or functions. Mixing such rich abstractions with logic 
is well known to be inconsistent so when we propose such rich 
abstractions in logic, we must accompany it with some discipline 
(such as typing) that will yield consistency. 

Second, we need to observe that differences between constants
and variables can be seen as one of "scope", at least from a syntactic,
proof theoretic, and computational point of view. For example,
variables are intended as syntactic objects that can "vary". During
the computation of, say, the relation of appending lists, universally
quantified variables surrounding Horn clauses change via substitution
(via backchaining and unification) but the constructors for the
empty and non-empty lists as well as the symbol denoting the append
relation do not change and, hence, can be seen as constants.
But from a compiling and linking point of view, the append predicate
might be considered something that varies: if append is in a
module of Prolog that is separately compiled, the append symbol
might denote a particular object in the compiled code that is later
changed when the code is loaded and linked. In a similar fashion,
we shall allow ourselves to instantiate constants with expressions
during static analysis.

Substituting for constants allows us to "split the atom": that is,
by substituting for the predicate $p$ in the atom $p(t_1, \ldots, t_n)$, we
replace that atom with a formula, which, in this paper, will be a
linear logic formula.

2.5 Linear logic underlies computational logic 

Linear logic [10] is able to explain the proof theory of usual Horn
clause logic programming (and even richer logic programming
languages [15]). It is also able to provide means to reason about
resources, such as items in multisets and sets. Thus, linear logic
will allow us to sit within one declarative framework to describe
both usual logic programming as well as "sub-atomic" reasoning
about the resources implicit in the arguments of predicates.



3. A primer for linear logic 

Linear logic connectives can be divided into the following groups:
the multiplicatives $⅋$, $\bot$, $\otimes$, $1$; the additives $\oplus$, $0$, $\&$, $\top$; the
exponentials $!$, $?$; the implications $\multimap$ (where $B \multimap C$ is defined
as $B^{\bot} ⅋ C$) and $\Rightarrow$ (where $B \Rightarrow C$ is defined as $(!B)^{\bot} ⅋ C$);
and the quantifiers $\forall$ and $\exists$ (higher-order quantification is allowed).
The equivalence of formulas in linear logic, $B \mathrel{\circ\!{-}\!\circ} C$, is defined as
the formula $(B \multimap C) \mathbin{\&} (C \multimap B)$.

First-order Horn clauses can be described as formulas of the
form

\[
\forall x_1 \ldots \forall x_m\,[A_1 \wedge \cdots \wedge A_n \supset A_0] \qquad (n, m \ge 0)
\]

where $\wedge$ and $\supset$ are intuitionistic or classical logic conjunction
and implication. There are at least two natural mappings of Horn
clauses into linear logic. The "multiplicative" mapping uses $\otimes$
and $\multimap$ for the conjunction and implication: this encoding is used
in, say, the linear logic programming settings, such as Lolli [15],
where Horn clause programming can interact with the surrounding
linear aspects of the full programming language. Here, we are not
interested in linear logic programming per se but with using linear
logic to help establish invariants about Horn clauses when these
are interpreted in the usual, classical setting. As a result, we shall
encode Horn clauses into linear logic using the conjunction $\&$ and
implication $\Rightarrow$: that is, we take Horn clauses to be formulas of the
form

\[
\forall x_1 \ldots \forall x_m\,[A_1 \mathbin{\&} \cdots \mathbin{\&} A_n \Rightarrow A_0] \qquad (n, m \ge 0).
\]

The usual proof search behavior of first-order Horn clauses in
classical (and intuitionistic) logic is captured precisely when this
style of linear logic encoding is used.

4. A primer for proof theory 

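A sequent is a triple of the form $\Sigma; \Gamma \vdash \Delta$ where $\Sigma$, the signature,
is a list of non-logical constants and eigenvariables paired with a
simple type, and where both $\Gamma$ and $\Delta$ are multisets of $\Sigma$-formulas
(i.e., formulas all of whose non-logical symbols are in $\Sigma$). The rules
for linear logic are the standard ones [10], except here signatures
have been added to sequents. The rules for quantifier introduction
are the only rules that require the signature and they are reproduced
here:

\[
\frac{\Sigma, y{:}\tau;\ B[y/x], \Gamma \vdash \Delta}{\Sigma;\ \exists x_\tau.B, \Gamma \vdash \Delta}
\qquad
\frac{\Sigma \vdash t : \tau \qquad \Sigma;\ \Gamma \vdash B[t/x], \Delta}{\Sigma;\ \Gamma \vdash \exists x_\tau.B, \Delta}
\]

\[
\frac{\Sigma \vdash t : \tau \qquad \Sigma;\ B[t/x], \Gamma \vdash \Delta}{\Sigma;\ \forall x_\tau.B, \Gamma \vdash \Delta}
\qquad
\frac{\Sigma, y{:}\tau;\ \Gamma \vdash B[y/x], \Delta}{\Sigma;\ \Gamma \vdash \forall x_\tau.B, \Delta}
\]

The premise $\Sigma \vdash t : \tau$ is the judgment that the term $t$ has the
(simple) type $\tau$ given the typing declaration contained in $\Sigma$.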

We now outline three ways to instantiate things within the 
sequent calculus. 

4.1 Substituting for types 

Although we think of formulas and proofs as untyped expressions,
we shall use simple typing within sequents to control the kind of
formulas that are present. A signature is used to bind and declare
typing for (eigen)variables and non-logical constants within a
sequent. Simple types are, formally speaking, also a simple class of
untyped λ-terms: the type $o$ is used to denote formulas (following
Church [7]). In a sequent calculus proof, simple type expressions
are global and admit no bindings. As a result, it is an easy matter to
show that if one takes a proof with a type constant $\sigma$ and replaces
everywhere $\sigma$ with some type, say, $\tau$, one gets another valid proof.
We shall do this later when we replace a list by a multiset that
approximates it: since we are using linear logic, we shall use formulas
to encode multisets and so we shall replace the type constant list
with $o$.

4.2 Substituting for non-logical constants 

Consider the sequent

\[
\Sigma, p{:}\tau;\ !D_1, !D_2, !\Gamma \vdash p(t_1, \ldots, t_m)
\]

where the type $\tau$ is a predicate type (that is, it is of the form
$\tau_1 \to \cdots \to \tau_m \to o$) and where $p$ appears in, say, $D_1$ and $D_2$
and in no formula of $\Gamma$. The linear logic exponential $!$ is used here
to encode the fact that the formulas $D_1$ and $D_2$ are available for
arbitrary reuse within a proof (the usual case for program clauses).
Using the right introduction rules for implication and the universal
quantifier, it follows that the sequent

\[
\Sigma;\ !\Gamma \vdash \forall p\,[D_1 \Rightarrow D_2 \Rightarrow p(t_1, \ldots, t_m)]
\]

is also provable. Since this is a universal quantifier, there must be
proofs for all instances of this quantifier. Let $\theta$ be the substitution
$[p \mapsto \lambda x_1 \ldots \lambda x_m.S]$, where $S$ is a term over the signature
$\Sigma \cup \{x_1, \ldots, x_m\}$ of type $o$. A consequence of the proof theory of
linear logic is that there is a proof also of

\[
\Sigma;\ !\Gamma \vdash D_1\theta \Rightarrow D_2\theta \Rightarrow S[t_1/x_1, \ldots, t_m/x_m]
\]

and of the sequent

\[
\Sigma;\ !D_1\theta, !D_2\theta, !\Gamma \vdash S[t_1/x_1, \ldots, t_m/x_m].
\]

As this example illustrates, it is possible to instantiate a predicate
(here $p$) with an abstraction of a formula (here, $\lambda x_1 \ldots \lambda x_m.S$).
Such instantiation carries a provable sequent to a provable sequent.

4.3 Substituting for assumptions 

An instance of the cut rule (mentioned earlier) is the following:

\[
\frac{\Sigma;\ \Gamma_1 \vdash B \qquad \Sigma;\ B, \Gamma_2 \vdash C}{\Sigma;\ \Gamma_1, \Gamma_2 \vdash C}
\]

This inference rule (especially when associated with the
cut-elimination procedure) provides a way to merge (substitute) the
proof of a formula (here, $B$) with a use of that formula as an
assumption. For example, consider the following situation. Given the
example in Section 4.2, assume that we can prove

\[
\Sigma;\ !\Gamma \vdash\ !D_1\theta \qquad \text{and} \qquad \Sigma;\ !\Gamma \vdash\ !D_2\theta.
\]

Using two instances of the cut rule and the proofs of these sequents,
it is possible to obtain a proof of the sequent

\[
\Sigma;\ !\Gamma \vdash S[t_1/x_1, \ldots, t_m/x_m]
\]

(contraction on the left for $!$'ed formulas must be applied).

Thus, by a series of instantiations of proofs, it is possible to
move from a proof of, say,

\[
\Sigma, p{:}\tau;\ !D_1, !D_2, !\Gamma \vdash p(t_1, \ldots, t_m)
\]

to a proof of

\[
\Sigma;\ !\Gamma \vdash S[t_1/x_1, \ldots, t_m/x_m].
\]

We shall see this style of reasoning about proofs several times below.
This allows us to "split an atom" $p(t_1, \ldots, t_m)$ into a formula
$S[t_1/x_1, \ldots, t_m/x_m]$ and to transform proofs of the atom
into proofs of that formula. In what follows, the formula $S$ will be
a linear logic formula that provides an encoding of some judgment
about the data structures encoded in the terms $t_1, \ldots, t_m$.

A few simple examples of using higher-order instantiations of 
logic programs in order to help reasoning about them appear in 

5. Encoding multisets as formulas 

We wish to encode multisets and sets and simple judgments about
them (such as inclusion and equality) as linear logic formulas. We
consider multisets first. Let the token item be a linear logic predicate
of one argument: the linear logic atomic formula $\mathit{item}\ x$ will denote
the multiset containing just the one element $x$ occurring once.
There are two natural encodings of multisets into formulas using this
predicate. The conjunctive encoding uses $1$ for the empty multiset
and $\otimes$ to combine two multisets. For example, the multiset $\{1, 2, 2\}$
is encoded by the linear logic formula $\mathit{item}\ 1 \otimes \mathit{item}\ 2 \otimes \mathit{item}\ 2$.
Proof search using this style of encoding places multisets on the left of
the sequent arrow. This approach is favored when an intuitionistic
subset of linear logic is used, such as in Lolli [15], LinearLF
[6], and MSR [5]. The dual encoding, the disjunctive encoding,
uses $\bot$ for the empty multiset and $⅋$ to combine two multisets.
Proof search using this style of encoding places multisets on the
right of the sequent arrow. Multiple conclusion sequents are now
required. Systems such as LO [2] and Forum [19] use this style
of encoding. If negation is available, then the choice of which
encoding one chooses is mostly a matter of style. We pick the
disjunctive encoding for the rather shallow reason that the inclusion
judgment for multisets and sets is encoded as an implication instead
of a reverse implication, as we shall now see.



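\[
\begin{array}{l}
\forall K.\ (\mathrm{append}\ \mathrm{nil}\ K\ K)\\
\forall X.\forall L.\forall K.\forall M.\ (\mathrm{append}\ L\ K\ M) \Rightarrow (\mathrm{append}\ (\mathrm{cons}\ X\ L)\ K\ (\mathrm{cons}\ X\ M))\\
\forall X.\ (\mathrm{split}\ X\ \mathrm{nil}\ \mathrm{nil}\ \mathrm{nil})\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ (\mathrm{leq}\ A\ X) \mathbin{\&} (\mathrm{split}\ X\ R\ S\ B) \Rightarrow (\mathrm{split}\ X\ (\mathrm{cons}\ A\ R)\ (\mathrm{cons}\ A\ S)\ B)\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ (\mathrm{gr}\ A\ X) \mathbin{\&} (\mathrm{split}\ X\ R\ S\ B) \Rightarrow (\mathrm{split}\ X\ (\mathrm{cons}\ A\ R)\ S\ (\mathrm{cons}\ A\ B))\\
(\mathrm{sort}\ \mathrm{nil}\ \mathrm{nil})\\
\forall F.\forall R.\forall S.\forall \mathit{Sm}.\forall B.\forall \mathit{SS}.\forall \mathit{BS}.\ (\mathrm{split}\ F\ R\ \mathit{Sm}\ B) \mathbin{\&} (\mathrm{sort}\ \mathit{Sm}\ \mathit{SS}) \mathbin{\&} (\mathrm{sort}\ B\ \mathit{BS}) \mathbin{\&} (\mathrm{append}\ \mathit{SS}\ (\mathrm{cons}\ F\ \mathit{BS})\ S) \Rightarrow (\mathrm{sort}\ (\mathrm{cons}\ F\ R)\ S)
\end{array}
\]

Figure 1. Some Horn clauses for specifying a sorting relation.

\[
\begin{array}{l}
\forall K.\ (\bot ⅋ K \mathrel{\circ\!{-}\!\circ} K)\\
\forall X.\forall L.\forall K.\forall M.\ (L ⅋ K \mathrel{\circ\!{-}\!\circ} M) \Rightarrow (\mathit{item}\ X ⅋ L ⅋ K \mathrel{\circ\!{-}\!\circ} \mathit{item}\ X ⅋ M)\\
\forall X.\ (\bot ⅋ \bot \mathrel{\circ\!{-}\!\circ} \bot)\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ (S ⅋ B \mathrel{\circ\!{-}\!\circ} R) \mathbin{\&} 1 \Rightarrow (\mathit{item}\ A ⅋ S ⅋ B \mathrel{\circ\!{-}\!\circ} \mathit{item}\ A ⅋ R)\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ (S ⅋ B \mathrel{\circ\!{-}\!\circ} R) \mathbin{\&} 1 \Rightarrow (S ⅋ \mathit{item}\ A ⅋ B \mathrel{\circ\!{-}\!\circ} \mathit{item}\ A ⅋ R)\\
(\bot \mathrel{\circ\!{-}\!\circ} \bot)\\
\forall F.\forall R.\forall S.\forall \mathit{Sm}.\forall B.\forall \mathit{SS}.\forall \mathit{BS}.\ (\mathit{Sm} ⅋ B \mathrel{\circ\!{-}\!\circ} R) \mathbin{\&} (\mathit{Sm} \mathrel{\circ\!{-}\!\circ} \mathit{SS}) \mathbin{\&} (B \mathrel{\circ\!{-}\!\circ} \mathit{BS}) \mathbin{\&} (\mathit{SS} ⅋ \mathit{item}\ F ⅋ \mathit{BS} \mathrel{\circ\!{-}\!\circ} S) \Rightarrow (\mathit{item}\ F ⅋ R \mathrel{\circ\!{-}\!\circ} S)
\end{array}
\]

Figure 2. The result of instantiating various non-logical constants in the above Horn clauses.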



Let $S$ and $T$ be the two formulas $\mathit{item}\ s_1 ⅋ \cdots ⅋ \mathit{item}\ s_n$ and
$\mathit{item}\ t_1 ⅋ \cdots ⅋ \mathit{item}\ t_m$, respectively ($n, m \ge 0$). Notice that
$\vdash S \multimap T$ if and only if $\vdash T \multimap S$ if and only if the two multisets
$\{s_1, \ldots, s_n\}$ and $\{t_1, \ldots, t_m\}$ are equal. Consider now, however,
the following two ways for encoding the multiset inclusion $S \subseteq T$.

• $S ⅋ 0 \multimap T$. This formula mixes multiplicative connectives
with the additive connective $0$: the latter allows items that are
not matched between $S$ and $T$ to be deleted.

• $\exists q\,(S ⅋ q \multimap T)$. This formula mixes multiplicative connectives
with a higher-order quantifier. While we can consider the
instantiation for $q$ to be the multiset difference of $S$ from $T$,
there is no easy way in the logic to enforce that interpretation
of the quantifier.

As it turns out, these two approaches are equivalent in linear logic:
in particular, $\vdash 0 \mathrel{\circ\!{-}\!\circ} \forall p.p$ (linear logic absurdity) and

\[
\vdash \forall S\,\forall T\,[(S ⅋ 0 \multimap T) \mathrel{\circ\!{-}\!\circ} \exists q\,(S ⅋ q \multimap T)].
\]

Thus, below we can choose either one of these encodings for
multiset inclusion.

6. Multisets approximations 

A multiset expression is a formula in linear logic built from the
predicate symbol item (denoting the singleton multiset), the linear
logic multiplicative disjunction $⅋$ (for multiset union), and the unit
$\bot$ for $⅋$ (used to denote the empty multiset). We shall also allow
a predicate variable (a variable of type $o$) to be used to denote
a (necessarily open) multiset expression. An example of an open
multiset expression is $\mathit{item}\ f(X) ⅋ \bot ⅋ Y$, where $Y$ is a variable
of type $o$, $X$ is a first-order variable, and $f$ is some first-order term
constructor.
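The definition of closed multiset expressions can be made concrete with a small sketch. The class names `Item`, `Bottom`, and `Par` and the function `denote` are hypothetical and chosen only for this illustration; predicate variables (open expressions) are deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """The atomic formula `item x`: the singleton multiset {x}."""
    value: object

@dataclass(frozen=True)
class Bottom:
    """The unit of the multiplicative disjunction: the empty multiset."""

@dataclass(frozen=True)
class Par:
    """Multiplicative disjunction of two multiset expressions: multiset union."""
    left: object
    right: object

def denote(expr):
    """Map a closed multiset expression to the multiset (Counter) it denotes."""
    if isinstance(expr, Item):
        return Counter([expr.value])
    if isinstance(expr, Bottom):
        return Counter()
    if isinstance(expr, Par):
        return denote(expr.left) + denote(expr.right)
    raise TypeError(f"not a closed multiset expression: {expr!r}")

# item 1, item 2, item 2 and the unit, combined with the multiplicative disjunction
example = Par(Item(1), Par(Item(2), Par(Item(2), Bottom())))
print(denote(example))
```

Two closed expressions that differ only in the placement of the unit or in the order of their items denote the same `Counter`, which matches the reading of the disjunction as multiset union.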

Let $S$ and $T$ be two multiset expressions. The two multiset
judgments that we wish to capture are multiset inclusion, written as
$S \subseteq T$, and equality, written as $S = T$. We shall use the syntactic
variable $\rho$ to range over these two judgments, which are formally
binary relations of type $o \to o \to o$. A multiset statement is a
formula of the form

\[
\forall \bar{x}\,[S_1\,\rho_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\rho_n\,T_n \Rightarrow S_0\,\rho_0\,T_0]
\]

where the quantified variables $\bar{x}$ are either first-order or of type
$o$ and formulas $S_0, T_0, \ldots, S_n, T_n$ are possibly open multiset
expressions.



If $S$ and $T$ are closed multiset expressions, then we write
$\models_m S \subseteq T$ whenever the multiset (of closed first-order terms) denoted
by $S$ is contained in the multiset denoted by $T$, and we write
$\models_m S = T$ whenever the multisets denoted by $S$ and $T$ are equal.
Similarly, we write

\[
\models_m \forall \bar{x}\,[S_1\,\rho_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\rho_n\,T_n \Rightarrow S_0\,\rho_0\,T_0]
\]

if for all closed substitutions $\theta$ such that $\models_m S_i\theta\ \rho_i\ T_i\theta$ for all
$i = 1, \ldots, n$, it is the case that $\models_m S_0\theta\ \rho_0\ T_0\theta$.

The following Proposition is central to our use of linear logic to 
establish multiset statements for Horn clause programs. 

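Proposition 1. Let $S_0, T_0, \ldots, S_n, T_n$ ($n \ge 0$) be multiset
expressions all of whose free variables are in the list of variables $\bar{x}$.
For each judgment $s\,\rho\,t$ we write $s\,\hat{\rho}\,t$ to denote $\exists q\,(s ⅋ q \multimap t)$ if
$\rho$ is $\subseteq$ and $t \mathrel{\circ\!{-}\!\circ} s$ if $\rho$ is $=$. If

\[
\forall \bar{x}\,[S_1\,\hat{\rho}_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\hat{\rho}_n\,T_n \Rightarrow S_0\,\hat{\rho}_0\,T_0]
\]

is provable in linear logic, then

\[
\models_m \forall \bar{x}\,[S_1\,\rho_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\rho_n\,T_n \Rightarrow S_0\,\rho_0\,T_0].
\]

This Proposition shows that linear logic can be used in a sound
way to infer valid multiset statements. On the other hand, the
converse (completeness) does not hold: the statement

\[
\forall x\,\forall y.\ (x \subseteq y) \mathbin{\&} (y \subseteq x) \Rightarrow (x = y)
\]

is valid but its translation into linear logic is not provable.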

To illustrate how deduction in linear logic can be used to
establish the validity of a multiset statement, consider the first-order
Horn clause program in Figure 1. The signature for this collection
of clauses can be given as follows:

nil    : list
cons   : int -> list -> list
append : list -> list -> list -> o
split  : int -> list -> list -> list -> o
sort   : list -> list -> o
leq    : int -> int -> o
gr     : int -> int -> o

The first two declarations provide constructors for empty and
non-empty lists, the next three are predicates whose Horn clause
definition is presented in Figure 1, and the last two are order relations
that are apparently defined elsewhere.



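\[
\begin{array}{l}
\forall X.\ (\mathrm{split}\ X\ \mathrm{nil}\ \mathrm{nil}\ \mathrm{nil})\\
\forall X.\forall B.\forall R.\forall S.\ (\mathrm{split}\ X\ R\ S\ B) \Rightarrow (\mathrm{split}\ X\ (\mathrm{cons}\ X\ R)\ S\ B)\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ (\mathrm{lt}\ A\ X) \mathbin{\&} (\mathrm{split}\ X\ R\ S\ B) \Rightarrow (\mathrm{split}\ X\ (\mathrm{cons}\ A\ R)\ (\mathrm{cons}\ A\ S)\ B)\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ (\mathrm{gr}\ A\ X) \mathbin{\&} (\mathrm{split}\ X\ R\ S\ B) \Rightarrow (\mathrm{split}\ X\ (\mathrm{cons}\ A\ R)\ S\ (\mathrm{cons}\ A\ B))
\end{array}
\]

Figure 3. A change in the specification of splitting lists to drop duplicates.

\[
\begin{array}{l}
\forall X.\ (?\,0 \multimap\ ?(\mathit{item}\ X \oplus 0 \oplus 0))\\
\forall X.\forall B.\forall R.\forall S.\ (?R \multimap\ ?(\mathit{item}\ X \oplus S \oplus B)) \Rightarrow (?(\mathit{item}\ X \oplus R) \multimap\ ?(\mathit{item}\ X \oplus S \oplus B))\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ 1 \mathbin{\&} (?R \multimap\ ?(\mathit{item}\ X \oplus S \oplus B)) \Rightarrow (?(\mathit{item}\ A \oplus R) \multimap\ ?(\mathit{item}\ X \oplus \mathit{item}\ A \oplus S \oplus B))\\
\forall X.\forall A.\forall B.\forall R.\forall S.\ 1 \mathbin{\&} (?R \multimap\ ?(\mathit{item}\ X \oplus S \oplus B)) \Rightarrow (?(\mathit{item}\ A \oplus R) \multimap\ ?(\mathit{item}\ X \oplus S \oplus \mathit{item}\ A \oplus B))
\end{array}
\]

Figure 4. The result of substituting set approximations into the split program.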



If we think of lists as collections of items, then we might want
to check that the sort program as written does not drop, duplicate,
or create any elements. That is, if the atom (sort s t) is provable
then the multiset of items in the list denoted by s is equal to the
multiset of items in the list denoted by t. If this property holds then
t and s are lists that are permutations of each other: of course, this
does not say that it is the correct permutation but this simpler
fact is one that, as we show, can be inferred automatically.

Computing this property of our example logic program
proceeds in the following three steps.

First, we provide an approximation of lists as being, in fact,
multisets: more precisely, formulas denoting multisets. The first
step, therefore, must be to substitute $o$ for list in the signature
above. We can now interpret the constructors for lists using
the substitution

\[
\mathrm{nil} \mapsto \bot \qquad \mathrm{cons} \mapsto \lambda x \lambda y.\ \mathit{item}\ x ⅋ y.
\]

Under such a mapping, the list (cons 1 (cons 3 (cons 2 nil))) is
mapped to the multiset expression $\mathit{item}\ 1 ⅋ \mathit{item}\ 3 ⅋ \mathit{item}\ 2 ⅋ \bot$.
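Substituting for the constructors can be pictured as evaluating the same term under two different interpretations of its constants. The following sketch assumes a hypothetical term representation (nested tuples whose first component names the constructor) and uses Python's `Counter` in place of linear logic formulas.

```python
from collections import Counter

def evaluate(term, interp):
    """Evaluate a first-order term, reading each constant through `interp`."""
    if isinstance(term, tuple):          # application: (head, arg1, ..., argn)
        head, *args = term
        return interp[head](*(evaluate(a, interp) for a in args))
    if isinstance(term, str):            # a constant such as "nil"
        return interp[term]
    return term                          # an integer literal

# Standard reading: nil and cons build ordinary lists.
as_list = {"nil": [], "cons": lambda x, y: [x] + y}

# Approximating reading: nil is the empty multiset, cons adds one item.
as_multiset = {"nil": Counter(), "cons": lambda x, y: Counter([x]) + y}

term = ("cons", 1, ("cons", 3, ("cons", 2, "nil")))
print(evaluate(term, as_list))
print(evaluate(term, as_multiset))
```

Only the interpretation table changes between the two calls; the term itself is untouched, which is the sense in which constants are treated as things that can vary.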

Second, we associate with each predicate in Figure 1 a multiset
judgment that encodes an invariant concerning the multisets
denoted by the predicate's arguments. For example, if (append r s t)
or (split u t r s) is provable then the multiset union of the items
in r with those in s is equal to the multiset of items in t, and if
(sort s t) is provable then the multisets of items in lists s and t
are equal. This association of multiset judgments to atomic formulas
can be achieved formally using the following substitutions for
constants:

\[
\begin{array}{l}
\mathrm{append} \mapsto \lambda x \lambda y \lambda z.\ (x ⅋ y) \mathrel{\circ\!{-}\!\circ} z\\
\mathrm{split} \mapsto \lambda u \lambda x \lambda y \lambda z.\ (y ⅋ z) \mathrel{\circ\!{-}\!\circ} x\\
\mathrm{sort} \mapsto \lambda x \lambda y.\ x \mathrel{\circ\!{-}\!\circ} y
\end{array}
\]

The predicates leq and gr (for the less-than-or-equal-to and
greater-than relations) make no statement about collections of
items, so that they can be mapped to a trivial tautology via the
substitution

\[
\mathrm{leq} \mapsto \lambda x \lambda y.\ 1 \qquad \mathrm{gr} \mapsto \lambda x \lambda y.\ 1
\]

Figure 2 presents the result of applying these mappings to Figure 1.

Third, we must now attempt to prove each of the resulting
formulas. In the case of Figure 2, all the displayed formulas are
trivial theorems of linear logic.
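As a sanity check on the instantiated clauses, one can test the corresponding multiset statements semantically. The sketch below is not the linear logic proof search used in this paper: it is a swapped-in linear-algebra check that applies only to statements built from equalities. Each equation is read as a vector of integer coefficients over its atoms (variables of type $o$ and item atoms), and the conclusion is accepted when its vector is a rational combination of the hypothesis vectors. This is a sufficient condition for validity, not a necessary one. All function names are hypothetical.

```python
from fractions import Fraction

def difference(lhs, rhs, atoms):
    """Coefficient vector of lhs - rhs, with one entry per atom."""
    return [Fraction(lhs.count(a) - rhs.count(a)) for a in atoms]

def in_span(target, generators):
    """Return True if target is a rational linear combination of generators."""
    basis = []  # pairs (pivot index, vector); each vector is zero at earlier pivots

    def reduce_vector(vector):
        vector = list(vector)
        for pivot, b in basis:
            if vector[pivot] != 0:
                c = vector[pivot] / b[pivot]
                vector = [v - c * w for v, w in zip(vector, b)]
        return vector

    for g in generators:
        r = reduce_vector(g)
        pivot = next((i for i, v in enumerate(r) if v != 0), None)
        if pivot is not None:
            basis.append((pivot, r))
    return all(v == 0 for v in reduce_vector(target))

def clause_is_valid(hypotheses, conclusion):
    """Each equation is a pair (lhs, rhs) of lists of atom names."""
    equations = hypotheses + [conclusion]
    atoms = sorted({a for lhs, rhs in equations for a in lhs + rhs})
    generators = [difference(lhs, rhs, atoms) for lhs, rhs in hypotheses]
    return in_span(difference(conclusion[0], conclusion[1], atoms), generators)

# The last clause of Figure 2 (the sort clause); "item F" is treated as one atom.
sort_hypotheses = [
    (["Sm", "B"], ["R"]),
    (["Sm"], ["SS"]),
    (["B"], ["BS"]),
    (["SS", "item F", "BS"], ["S"]),
]
sort_conclusion = (["item F", "R"], ["S"])
print(clause_is_valid(sort_hypotheses, sort_conclusion))

# A faulty variant whose conclusion forgets the pivot F.
print(clause_is_valid(sort_hypotheses, (["R"], ["S"])))
```

The faulty variant is rejected because no combination of the hypotheses accounts for the missing occurrence of `item F`, which corresponds to a sort program that silently drops an element.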

Having taken these three steps, we now claim that we have
proved the intended collection judgments associated to each of the
logic programming predicates above: in particular, we have now
shown that our particular sort program computes a permutation.



7. Formalizing the method 

The formal correctness of this three stage approach is easily
justified given the substitution properties we presented in Section 4 for
the sequent calculus presentation of linear logic.

Let $\Gamma$ denote a set of formulas that contains those in Figure 1.
Let $\theta$ denote the substitution described above for the type list, for
the constructors nil and cons, and for the predicates in Figure 1.
If $\Sigma$ is the signature for $\Gamma$ then split $\Sigma$ into the two signatures $\Sigma_1$
and $\Sigma_2$ so that $\Sigma_1$ is the domain of the substitution $\theta$ and let $\Sigma_3$
be the signature of the range of $\theta$ (in this case, it just contains the
constant item). Thus, $\Gamma\theta$ is the set of formulas in Figure 2.

Assume now that $\Sigma_1, \Sigma_2; \Gamma \vdash \mathit{sort}(t, s)$ is provable. Given the
discussion in Sections 4.1 and 4.2, we know that

\[
\Sigma_1, \Sigma_3;\ \Gamma\theta \vdash t\theta \mathrel{\circ\!{-}\!\circ} s\theta
\]

is provable. Since the formulas in $\Gamma\theta$ are provable, we can use
substitution into proofs (Section 4.3) to conclude that
$\Sigma_1, \Sigma_3; \vdash t\theta \mathrel{\circ\!{-}\!\circ} s\theta$. Given Proposition 1, we can conclude that
$\models_m t\theta = s\theta$: that is, that $t\theta$ and $s\theta$ encode the same multiset.

Consider the following model theoretic argument for establishing
similar properties of Horn clauses. Let $\mathcal{M}$ be the Herbrand
model that captures the invariants that we have in mind. In
particular, $\mathcal{M}$ contains the atoms (append r s t) and (split u t r s)
if the items in the list r added to the items in list s are the same
as the items in t. Furthermore, $\mathcal{M}$ contains all closed atoms of the
form (leq t s) and (gr t s), and closed atoms (sort s t) where s
and t are lists that are permutations of one another. One can now
show that $\mathcal{M}$ satisfies all the Horn clauses in Figure 1. As a
consequence of the soundness of first-order classical logic, any atom
provable from the clauses in Figure 1 must be true in $\mathcal{M}$. By
construction of $\mathcal{M}$, this means that the desired invariant holds for all
atoms proved from the program.
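The model theoretic argument can be mimicked on a small scale by brute force. The sketch below checks that the two append clauses of Figure 1 are satisfied by the intended model when lists range over a small, hypothetical universe (lists of length at most two over the items 0 and 1). Such a bounded check is only an illustration and does not replace the general argument.

```python
from collections import Counter
from itertools import product

ITEMS = (0, 1)
# All lists over ITEMS of length 0, 1, or 2.
LISTS = [list(p) for n in range(3) for p in product(ITEMS, repeat=n)]

def in_model(r, s, t):
    """(append r s t) is in the model when the items of r and s together are those of t."""
    return Counter(r) + Counter(s) == Counter(t)

def append_clauses_hold():
    """Check both append clauses of Figure 1 against the model."""
    fact = all(in_model([], k, k) for k in LISTS)
    rule = all(
        in_model([x] + l, k, [x] + m)
        for x in ITEMS for l in LISTS for k in LISTS for m in LISTS
        if in_model(l, k, m)
    )
    return fact and rule

def faulty_clause_holds():
    """Hypothetical faulty clause: (append L K M) => (append (cons X L) K M)."""
    return all(
        in_model([x] + l, k, m)
        for x in ITEMS for l in LISTS for k in LISTS for m in LISTS
        if in_model(l, k, m)
    )

print(append_clauses_hold())
print(faulty_clause_holds())
```

The faulty clause fails already for empty lists, since adding an item to the first argument without adding it to the third breaks the invariant.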

The approach suggested here using linear logic and deduction 
remains syntactic and proof theoretic: in particular, showing that 
a model satisfies a Horn clause is replaced by a deduction within 
linear logic. 

8. Sets approximations 

It is rather easy to encode sets and the equality and subset
judgments on sets into linear logic. In fact, the transition to sets from
multisets is provided by the use of the linear logic exponential: since
we are using the disjunctive encoding of collections (see the discussion
in Section 5), we use the $?$ exponential (if we were using the
conjunctive encoding, we would use the $!$ exponential).

The expression $?\,\mathit{item}\ t$ can be seen as describing the presence
of an item for which the exact multiplicity does not matter: this
formula represents the capacity to be used any number of times.
Thus, the set $\{x_1, \ldots, x_n\}$ can be encoded as $?\,\mathit{item}\ x_1 ⅋ \cdots ⅋\ ?\,\mathit{item}\ x_n$.
Using logical equivalences of linear logic, this formula is
also equivalent to the formula $?(\mathit{item}\ x_1 \oplus \cdots \oplus \mathit{item}\ x_n)$. This latter
encoding is the one that we shall use for building our encoding of
sets.

A set expression is a formula in linear logic built from the
predicate symbol item (denoting the singleton set), the linear
logic additive disjunction $\oplus$ (for set union), and the unit $0$ for $\oplus$
(used to denote the empty set). We shall also allow a predicate
variable (a variable of type $o$) to be used to denote a (necessarily
open) set expression. An example of an open set expression
is $\mathit{item}\ f(X) \oplus 0 \oplus Y$, where $Y$ is a variable of type $o$, $X$ is a
first-order variable, and $f$ is some first-order term constructor.

Let $S$ and $T$ be two set expressions. The two set judgments
that we wish to capture are set inclusion, written as $S \subseteq T$, and
equality, written as $S = T$. We shall use the syntactic variable
$\rho$ to range over these two judgments, which are formally binary
relations of type $o \to o \to o$. A set statement is a formula of the
form

\[
\forall \bar{x}\,[S_1\,\rho_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\rho_n\,T_n \Rightarrow S_0\,\rho_0\,T_0]
\]

where the quantified variables $\bar{x}$ are either first-order or of type $o$
and formulas $S_0, T_0, \ldots, S_n, T_n$ are possibly open set expressions.

If $S$ and $T$ are closed set expressions, then we write $\models_s S \subseteq T$
whenever the set (of closed first-order terms) denoted by $S$ is
contained in the set denoted by $T$, and we write $\models_s S = T$
whenever the sets denoted by $S$ and $T$ are equal. Similarly, we
write

\[
\models_s \forall \bar{x}\,[S_1\,\rho_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\rho_n\,T_n \Rightarrow S_0\,\rho_0\,T_0]
\]

if for all closed substitutions $\theta$ such that $\models_s S_i\theta\ \rho_i\ T_i\theta$ for all
$i = 1, \ldots, n$, it is the case that $\models_s S_0\theta\ \rho_0\ T_0\theta$.

The following Proposition is central to our use of linear logic to 
establish set statements for Horn clause programs. 

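Proposition 2. Let $S_0, T_0, \ldots, S_n, T_n$ ($n \ge 0$) be set
expressions all of whose free variables are in the list of variables $\bar{x}$. For
each judgment $s\,\rho\,t$ we write $s\,\hat{\rho}\,t$ to denote $?s \multimap\ ?t$ if $\rho$ is $\subseteq$ and
$(?s \multimap\ ?t) \mathbin{\&} (?t \multimap\ ?s)$ if $\rho$ is $=$. If

\[
\forall \bar{x}\,[S_1\,\hat{\rho}_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\hat{\rho}_n\,T_n \Rightarrow S_0\,\hat{\rho}_0\,T_0]
\]

is provable in linear logic, then

\[
\models_s \forall \bar{x}\,[S_1\,\rho_1\,T_1 \mathbin{\&} \cdots \mathbin{\&} S_n\,\rho_n\,T_n \Rightarrow S_0\,\rho_0\,T_0].
\]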

Lists can be approximated by sets by using the following
substitution:

\[ \mathit{nil} \mapsto 0 \qquad \mathit{cons} \mapsto \lambda x \lambda y.\ \mathit{item}\ x \oplus y \]

Under such a mapping, the list (cons 1 (cons 2 (cons 2 nil))) is
mapped to the set expression $\mathit{item}\ 1 \oplus \mathit{item}\ 2 \oplus \mathit{item}\ 2 \oplus 0$. This
expression is equivalent ($\multimapboth$) to the set expression $\mathit{item}\ 1 \oplus \mathit{item}\ 2$.
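The substitution above can be illustrated with a minimal Python sketch. The tuple representation of set expressions and the helper names (`approx_list`, `denote`) are assumptions made for this illustration only; the sketch computes the set denoted by a closed set expression and does not perform any linear logic proof search.

```python
# Hypothetical representation of closed set expressions as nested tuples:
#   ("item", t)      the singleton set {t}
#   ("plus", s, t)   additive disjunction, read as set union
#   ("zero",)        the unit of plus, read as the empty set
ZERO = ("zero",)

def item(t):
    return ("item", t)

def plus(s, t):
    return ("plus", s, t)

def approx_list(xs):
    """Apply the substitution nil -> 0, cons -> (x, y -> item x + y)."""
    expr = ZERO
    for x in reversed(xs):
        expr = plus(item(x), expr)
    return expr

def denote(expr):
    """Return the set of closed terms denoted by a closed set expression."""
    tag = expr[0]
    if tag == "zero":
        return frozenset()
    if tag == "item":
        return frozenset([expr[1]])
    return denote(expr[1]) | denote(expr[2])

# The list (cons 1 (cons 2 (cons 2 nil))) from the text.
e = approx_list([1, 2, 2])
same_set = denote(e) == denote(plus(item(1), item(2)))
```

Here `same_set` records that the two expressions denote the same set, which is the semantic counterpart of the equivalence stated above.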

For a simple example of using set approximations, consider
modifying the sorting program provided before so that duplicates are
not kept in the sorted list. Make this modification by replacing the
previous definition for splitting a list with the clauses in Figure [5].
That figure contains a new definition of splitting that contains three
clauses for deciding whether or not the "pivot" for the splitting $X$ is
equal to, less than (using the lt predicate), or greater than the first
member of the list being split. Using the following substitutions for
predicates

\[ \begin{array}{rcl}
\mathit{append} &\mapsto& \lambda x \lambda y \lambda z.\ ?(x \oplus y) \multimapboth\ ?z \\
\mathit{split} &\mapsto& \lambda u \lambda x \lambda y \lambda z.\ ?x \multimap\ ?(\mathit{item}\ u \oplus y \oplus z) \\
\mathit{sort} &\mapsto& \lambda x \lambda y.\ ?x \multimapboth\ ?y
\end{array} \]

(as well as the trivial substitution for lt and ge), we can show that
sort relates two lists only if those lists are approximated by the same
set.



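\[ \frac{}{\Gamma; A_i \vdash A_1 \oplus \cdots \oplus A_n}\ \oplus R
\qquad
\frac{\Gamma; A_1 \vdash C \quad \cdots \quad \Gamma; A_n \vdash C}{\Gamma; A_1 \oplus \cdots \oplus A_n \vdash C}\ \oplus L \]

\[ \frac{\Gamma; B_1 \oplus \cdots \oplus B_m \vdash C}{\Gamma; A \vdash C}\ BC \]

Here, $n, m \geq 0$ and in the BC (backchaining) inference rule, the
formula $?(A_1 \oplus \cdots \oplus A_n) \multimap\ ?(B_1 \oplus \cdots \oplus B_m)$ must be a member
of $\Gamma$ and $A \in \{A_1, \ldots, A_n\}$.

Figure 5. Specialized proof rules for proving set statements.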

In the case of determining the validity of a set statement, the
use of linear logic here appears to be rather weak when compared
to the large body of results for solving set-based constraint systems.

9. Automation of deduction 

We describe how automation of proof for the linear logic
translations of set and multiset statements given in Propositions 1 and 2
can be performed.

In order to understand how to automatically prove the required
formulas, we first provide a normal form theorem for the fragment
of linear logic in which we are interested. The key result of linear
logic surrounding the search for cut-free proofs is given by the
completeness of focused proofs [3]. Focused proofs are a normal
form that significantly generalizes standard completeness results in
logic programming, including the completeness of SLD-resolution
and uniform proofs as well as various forms of bottom-up and
top-down reasoning.

We first analyze the nature of proof search for the linear logic
translation of set statements. Note that when considering provability
of set statements, there is no loss of generality if the only set
judgment it contains is the subset judgment since set equality can
be expressed as two inclusions. We now prove that the proof system
in Figure 5 is sound and complete for proving set statements.

Proposition 3. Let $S_0, T_0, \ldots, S_n, T_n$ ($n \geq 0$) be set expressions
all of whose free variables are in the list of variables $\bar{x}$. The
formula

\[ \forall\bar{x}\,[(?S_1 \multimap\ ?T_1) \;\&\; \cdots \;\&\; (?S_n \multimap\ ?T_n) \Rightarrow (?S_0 \multimap\ ?T_0)] \]

is provable in linear logic if and only if the sequent

\[ (?S_1 \multimap\ ?T_1), \ldots, (?S_n \multimap\ ?T_n);\ S_0 \vdash T_0 \]

is provable using the proof system in Figure 5.

Proof. The soundness part of this proposition ("if") is easy to
show. For completeness ("only if"), we use the completeness of
focused proofs in [3]. In order to use this result of focused proofs,
we need to give a polarity to all atomic formulas. We do this by
assigning all atomic formulas (those of the form $\mathit{item}(\cdot)$ and those
symbols in $\bar{x}$ of type $o$) negative polarity. Second, we need to
translate the two-sided sequent $\Gamma; S \vdash T$ to $\vdash \Gamma^\perp; T \Uparrow S^\perp$ when
$S$ is not atomic (that is, its top-level logical connective is $\oplus$) and
to $\vdash \Gamma^\perp, T; S^\perp \Uparrow \cdot$ when $S$ is an atom. Completeness then follows
directly from the structure of focused proofs. □

Notice that the resulting proofs are essentially bottom-up: one
reasons from formulas on the left of the sequent arrow to formulas
on the right.

We can now conclude that it is decidable to determine whether
or not the linear logic translation of a set statement is provable.
Notice that in a proof built using the inference rules in Figure 5, if
the endsequent is $\Gamma; S \vdash T$ then all sequents in the proof have the
form $\Gamma; S' \vdash T$, for some $S'$. Thus, the search for a proof either
succeeds (proof search ends by placing $\oplus R$ on top), or fails to find
a proof, or it cycles, a case we can always detect since there is only
a finite number of atomic formulas that can be $S'$.

\[ \frac{}{\Gamma; A_1 \parr \cdots \parr A_n \vdash A_1, \ldots, A_n}\ \parr L
\qquad
\frac{\Gamma; S \vdash T_1, T_2, \Delta}{\Gamma; S \vdash T_1 \parr T_2, \Delta}\ \parr R \]

\[ \frac{\Gamma; S \vdash A_1, \ldots, A_n, \Delta}{\Gamma; S \vdash B_1, \ldots, B_m, \Delta}\ BC \]

Here, $n, m \geq 0$ and in the BC (backchaining) inference rule, it
must be the case that the formula

\[ (A_1 \parr \cdots \parr A_n) \multimap (B_1 \parr \cdots \parr B_m) \]

is a member of $\Gamma$.

Figure 6. Specialized proof rules for proving multiset statements.
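The decision procedure for set statements described above (proof search in the system of Figure 5 with cycle detection) can be sketched in Python as follows. This is only an illustration under stated assumptions: atoms are represented as strings, a set expression is a list of atoms (its $\oplus$-summands), each assumption $?(A_1 \oplus \cdots \oplus A_n) \multimap\ ?(B_1 \oplus \cdots \oplus B_m)$ is a pair of such lists, and the function name `provable` is hypothetical.

```python
def provable(gamma, s_atoms, t_atoms):
    """Search for a proof of  gamma; S |- T  in the style of Figure 5.

    gamma   : list of (lhs_atoms, rhs_atoms) pairs, one per assumption
    s_atoms : the atoms whose (+)-sum is the left-hand expression S
    t_atoms : the atoms whose (+)-sum is the right-hand expression T
    """
    t = frozenset(t_atoms)

    def prove_atom(a, path):
        if a in t:
            # (+)R: the atom on the left occurs in the right-hand sum.
            return True
        if a in path:
            # The same atom recurred on this branch: a cycle, so give up here.
            return False
        for lhs, rhs in gamma:
            # BC on an assumption whose left side mentions a, followed by
            # (+)L on the resulting sum B1 (+) ... (+) Bm.
            if a in lhs and all(prove_atom(b, path | {a}) for b in rhs):
                return True
        return False

    # (+)L on the initial left-hand expression.
    return all(prove_atom(a, frozenset()) for a in s_atoms)


# Assumptions: x <= {u} U y U z,  y <= w,  z <= w  (illustrative names).
gamma = [(["x"], ["item u", "y", "z"]), (["y"], ["w"]), (["z"], ["w"])]
ok = provable(gamma, ["x"], ["item u", "w"])
not_ok = provable(gamma, ["x"], ["w"])
```

Because the right-hand side never changes during search, the only state is the current atom on the left, which is why tracking the atoms on the current branch is enough to detect cycles.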

The proof system in Figure 6 can be used to characterize the
structure of proofs of the linear logic encoding of multiset
statements. Let

\[ \forall\bar{x}\,[S_1\ \hat{\rho}_1\ T_1 \;\&\; \cdots \;\&\; S_n\ \hat{\rho}_n\ T_n \Rightarrow S_0\ \hat{\rho}_0\ T_0] \]

be the translation of a multiset statement into linear logic. Provability
of this formula can be reduced to attempting to prove $S_0\ \hat{\rho}_0\ T_0$
from assumptions of the form

\[ (A_1 \parr \cdots \parr A_n) \multimap (B_1 \parr \cdots \parr B_m), \]

where $A_1, \ldots, A_n, B_1, \ldots, B_m$ are atomic formulas. Such formulas
can be called multiset rewriting clauses since backchaining on
such clauses amounts to rewriting the right-hand-side multiset of a
sequent (see rule BC in Figure 6). Such rewriting clauses are
particularly simple since they do not involve quantification.

Proposition 4. Let $S_0$ and $T_0$ be multiset expressions all of
whose free variables are in the list of variables $\bar{x}$ and let $\Gamma$ be a
set of multiset rewriting rules. The formula $S_0 \multimap T_0$ is a linear
logic consequence of $\Gamma$ if and only if the sequent $\Gamma; S_0 \vdash T_0$ is
provable using the inference rules in Figure 6.

Proof. The soundness part of this proposition ("if") is easy to
show. Completeness ("only if") is proved elsewhere, for example,
in [18, Proposition 2]. It is also an easy consequence of the
completeness of focused proofs in [3]: fix the polarity of all atomic
formulas to be positive. □
Notice that the proofs using the rules in Figure 6 are straight-line
proofs (no branching) and that they are top-down (or goal-directed).
Given these observations, it follows that determining if $S_0 \multimap T_0$ is
provable from a set of multiset rewriting clauses is decidable, since
this problem is contained within the reachability problem of Petri
nets [9]. Proving a multiset inclusion judgment $\exists q\,(S_0 \parr q \multimap T_0)$
involves first instantiating this higher-order quantifier. In principle,
this instantiation can be delayed until attempting to apply the sole
instance of the $\parr L$ rule (Figure 6).
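The goal-directed reading of Figure 6 can be illustrated with a small Python sketch. It is not the Petri net reachability algorithm referred to above: it is a plain breadth-first search with a state budget, and it reports an unknown result when the budget is exhausted. The representation (atoms as strings, multisets as `collections.Counter`, clauses as pairs of atom lists) and the names `rewrites_to` and `max_states` are assumptions of this sketch.

```python
from collections import Counter, deque

def canonical(ms):
    """Hashable canonical form of a multiset of atom names."""
    return tuple(sorted(ms.items()))

def rewrites_to(clauses, s_atoms, t_atoms, max_states=10_000):
    """Search for a proof of  clauses; S |- T  in the style of Figure 6.

    Each clause (lhs, rhs) stands for (A1 par ... par An) -o (B1 par ... par Bm).
    Returns True if a proof is found, False if the finite search space is
    exhausted, and None if the state budget runs out first.
    """
    target = Counter(s_atoms)      # par-L closes the proof on this multiset
    start = Counter(t_atoms)       # par-R flattens T into its atoms
    seen = {canonical(start)}
    queue = deque([start])
    while queue:
        delta = queue.popleft()
        if delta == target:
            return True
        for lhs, rhs in clauses:
            need = Counter(rhs)
            if any(delta[b] < k for b, k in need.items()):
                continue           # BC does not apply: some B is missing
            # BC read from conclusion to premise: replace the B's by the A's.
            nxt = (delta - need) + Counter(lhs)
            sig = canonical(nxt)
            if sig not in seen:
                if len(seen) >= max_states:
                    return None
                seen.add(sig)
                queue.append(nxt)
    return False


# One rewriting clause: (a par b) -o c.
clauses = [(["a", "b"], ["c"])]
found = rewrites_to(clauses, ["a", "b", "d"], ["c", "d"])
missing = rewrites_to(clauses, ["a", "d"], ["c", "d"])
```

Each step rewrites the right-hand multiset only, so every proof is a single chain of states, matching the observation that these proofs do not branch.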

10. List approximations 

We now consider using lists as approximations. Since lists have 
more structure than sets and multisets, it is more involved to encode 
and reason with them. We only illustrate their use and do not follow 
a full formal treatment for them. 

Since the order of elements in a list is important, the encoding
of lists into linear logic must involve a connective that is not
commutative. (Notice that both $\parr$ and $\oplus$ are commutative.) Linear
implication provides a good candidate for encoding the order used
in lists. For example, consider proof search with the formula

\[ \mathit{item}\ a \multimapinv (p \multimapinv (\mathit{item}\ b \multimapinv (p \multimapinv \bot))) \]

on the right. (This formula is equivalent to $\mathit{item}\ a \parr (p^\perp \otimes (\mathit{item}\ b \parr p^\perp))$.)
Such a formula can be seen as describing a process that is
willing to output the item $a$ and then go into input mode, waiting for
the atomic formula $p$ to appear. If that formula appears, then item
$b$ is output and the process again goes into input mode, waiting for
$p$. If another occurrence of $p$ appears, this process becomes the
inactive process. Clearly, $a$ is output prior to when $b$ is output: this
ordering is faithfully captured by proof search in linear logic. Such
an encoding of asynchronous process calculi into linear logic has
been explored in a number of papers: see, for example, [16, 21].

The example above suggests that lists and list equality can be
captured directly in linear logic using the following encoding:

\[ \mathit{nil} \mapsto \lambda l.\ \bot \qquad \mathit{cons} \mapsto \lambda x \lambda R \lambda l.\ \mathit{item}\ x \multimapinv (l \multimapinv (R\ l)) \]

The encoding of the list, say (cons a (cons b nil)), is given by
the $\lambda$-abstraction

\[ \lambda l.\ \mathit{item}\ a \multimapinv (l \multimapinv (\mathit{item}\ b \multimapinv (l \multimapinv \bot))). \]

The following proposition can be proved by induction on the
length of the list $t$.

Proposition 5. Let $s$ and $t$ be two lists (built using nil and
cons) and let $S$ and $T$ be the translation of those lists into expressions
of type $o \to o$ via the substitution above. Then $\forall l.\,(S\ l) \multimapboth (T\ l)$
is provable in linear logic if and only if $s$ and $t$ are the same
list.

This presentation of lists can be "degraded" to multisets simply
by applying the translation of a list to the formula $\bot$. For example,
applying the translation of (cons a (cons b nil)) to $\bot$ yields the
formula

\[ \mathit{item}\ a \multimapinv (\bot \multimapinv (\mathit{item}\ b \multimapinv (\bot \multimapinv \bot))) \]

which is linear logically equivalent to $\mathit{item}\ a \parr \mathit{item}\ b$.
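The list encoding of this section can be reproduced mechanically. The following Python sketch builds the formulas as nested tuples and prints them, using `o-` for reverse linear implication and `bot` for $\bot$; these spellings and the helper names are choices made for this illustration. The sketch only constructs formulas: it does not decide the logical equivalences mentioned in the text.

```python
BOT = "bot"

def item(t):
    return ("item", t)

def rimp(a, b):
    """The formula  a o- b  (that is, b linearly implies a)."""
    return ("o-", a, b)

def nil(l):
    # nil -> (l -> bot)
    return BOT

def cons(x, r):
    # cons -> (x, R, l -> item x o- (l o- (R l)))
    return lambda l: rimp(item(x), rimp(l, r(l)))

def show(f):
    """Render a formula, parenthesising nested implications."""
    if isinstance(f, str):
        return f
    if f[0] == "item":
        return "item " + str(f[1])
    def wrap(g):
        s = show(g)
        return "(" + s + ")" if not isinstance(g, str) and g[0] == "o-" else s
    return wrap(f[1]) + " o- " + wrap(f[2])

ab = cons("a", cons("b", nil))     # encoding of (cons a (cons b nil))
as_list = show(ab("l"))            # body of the abstraction over l
as_multiset = show(ab(BOT))        # "degraded" by applying to bot
```

Applying the encoded list to the symbol `l` reproduces the body of the $\lambda$-abstraction shown earlier, and applying it to `bot` reproduces the degraded formula.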

Given this presentation of lists, there appears to be no simple
combinator for, say, list concatenation and, as a result, there is no
direct way to express the judgments of prefix, suffix, sublist, etc.
Thus, beyond equality of lists (by virtue of Proposition 5) there
are few natural judgments that can be stated for lists. More can be
done, however, by considering difference lists.

11. Difference list approximations 

Since our framework includes $\lambda$-abstractions, it is natural to
represent difference lists as a particular kind of list abstraction over a list.
For example, in λProlog a difference list is naturally represented as
a $\lambda$-term of the form

\[ \lambda L.\ \mathit{cons}\ x_1\ (\mathit{cons}\ x_2\ (\cdots (\mathit{cons}\ x_n\ L) \cdots)). \]

Such abstracted lists are appealing since the simple operation of
composition encodes the concatenation of two lists. Given
concatenation, it is then easy to encode the judgments of prefix and suffix.
For other examples of computing on difference lists described in
this fashion, see [4].

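Lists can be encoded using the difference list notion with the
following mapping into linear logic formulas.

\[ \begin{array}{rcl}
\mathit{nil} &\mapsto& \lambda L \lambda l.\ L\ l \\
\mathit{cons} &\mapsto& \lambda x \lambda R \lambda L \lambda l.\ \mathit{item}\ x \multimapinv (l \multimapinv (R\ L\ l))
\end{array} \]

The encoding of the list, say (cons a (cons b nil)), is given by
the $\lambda$-abstraction

\[ \lambda L \lambda l.\ \mathit{item}\ a \multimapinv (l \multimapinv (\mathit{item}\ b \multimapinv (l \multimapinv L\ l))). \]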
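As a hedged illustration, the difference-list encoding can be modelled with Python closures, where the abstracted tail $L$ is a function from formulas to formulas. The tuple representation, the symbolic tail `W`, and the `append` combinator are assumptions of this sketch; `append` is simply composition on the $L$ argument, mirroring the remark that composition encodes concatenation.

```python
def item(t):
    return ("item", t)

def rimp(a, b):
    """The formula  a o- b  (that is, b linearly implies a)."""
    return ("o-", a, b)

def nil(L):
    # nil -> (L, l -> L l)
    return lambda l: L(l)

def cons(x, R):
    # cons -> (x, R, L, l -> item x o- (l o- (R L l)))
    return lambda L: lambda l: rimp(item(x), rimp(l, R(L)(l)))

def encode(xs):
    """Encode a Python list using the nil/cons substitution above."""
    enc = nil
    for x in reversed(xs):
        enc = cons(x, enc)
    return enc

def append(R, S):
    """Concatenation as composition on the abstracted tail (an assumption)."""
    return lambda L: R(S(L))

# A symbolic tail W and a symbolic atom w, used to obtain concrete formulas.
W = lambda l: ("W", l)

ab = encode(["a", "b"])(W)("w")
joined = append(encode(["a"]), encode(["b", "c"]))(W)("w")
direct = encode(["a", "b", "c"])(W)("w")
```

In this model `joined` and `direct` are the same formula, so concatenating the encodings agrees with encoding the concatenated list.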



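\[ \begin{array}{l}
(\mathit{traverse}\ \mathit{emp}\ \mathit{nil}) \\
\forall N. \forall R. \forall S.\ (\mathit{traverse}\ R\ S) \supset (\mathit{traverse}\ (\mathit{bt}\ N\ \mathit{emp}\ R)\ (\mathit{cons}\ N\ S)) \\
\forall N. \forall M. \forall R. \forall S. \forall L_1. \forall L_2.\ (\mathit{traverse}\ (\mathit{bt}\ M\ L_1\ (\mathit{bt}\ N\ L_2\ R))\ S) \supset (\mathit{traverse}\ (\mathit{bt}\ N\ (\mathit{bt}\ M\ L_1\ L_2)\ R)\ S)
\end{array} \]

Figure 7. Traversing a binary tree to produce a list.

\[ \begin{array}{l}
\forall W. \forall w.\ W\,w \multimapboth W\,w \\[1ex]
\forall N. \forall R. \forall S. \forall W. \forall w.\ (\mathit{item}\ N \multimapinv (w \multimapinv R\,W\,w)) \multimapboth (\mathit{item}\ N \multimapinv (w \multimapinv S\,W\,w)) \multimapinv \forall W. \forall w.\ R\,W\,w \multimapboth S\,W\,w \\[1ex]
\forall M. \forall N. \forall L_1. \forall L_2. \forall R. \forall S. \forall W. \forall w. \\
\quad L_1 (\lambda k.\ \mathit{item}\ M \multimapinv (k \multimapinv L_2 (\lambda l.\ \mathit{item}\ N \multimapinv (l \multimapinv R\,W\,l))\ k))\ w \multimapboth S\,W\,w \multimapinv {} \\
\quad \forall W. \forall w.\ L_1 (\lambda k.\ \mathit{item}\ M \multimapinv (k \multimapinv L_2 (\lambda l.\ \mathit{item}\ N \multimapinv (l \multimapinv R\,W\,l))\ k))\ w \multimapboth S\,W\,w
\end{array} \]

Figure 8. Linear logic formulas arising from a difference list approximation.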



In Figure 7, a predicate for traversing a binary tree is given.
Binary trees are encoded using the type $\mathit{btree}$ and are constructed
using the constructors $\mathit{emp}$, for the empty tree, and $\mathit{bt}$ of type
$\mathit{int} \to \mathit{btree} \to \mathit{btree} \to \mathit{btree}$, for building non-empty
trees. A useful invariant of this program is that the list of items
approximating the binary tree structure in the first argument of
traverse is equal to the list of items in the second argument.
Linear logic formulas for computing that approximation can be
generated using the following approximating substitution.

\[ \begin{array}{rcl}
\mathit{btree} &\mapsto& o \\
\mathit{emp} &\mapsto& \lambda L \lambda l.\ L\ l \\
\mathit{bt} &\mapsto& \lambda x \lambda R \lambda S \lambda L \lambda l.\ R\ (\lambda l.\ \mathit{item}\ x \multimapinv (l \multimapinv (S\ L\ l)))\ l
\end{array} \]

The result of applying that substitution (as well as the one above for
nil and cons) is displayed in Figure 8. While these formulas
appear rather complex, they are all rather simple theorems of
higher-order linear logic: these theorems are essentially trivial since the
$\lambda$-conversions used to build the formulas from the data structures
have done all the essential work in organizing the items into a list.
Establishing these formulas proves that the order and multiplicity
of elements in the binary tree and in the list in a provable traverse
computation are the same.
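The claim that $\lambda$-conversion does the essential work can be checked on concrete data with a short Python sketch. The assumptions are as follows: trees are represented as `None` (for emp) or triples `(label, left, right)`, formulas are nested tuples, `W` and `"w"` are symbolic stand-ins for the quantified variables of Figure 8, and `traverse` is a direct functional reading of the clauses of Figure 7. The sketch compares normalised encodings syntactically; it is not a linear logic prover.

```python
def item(t):
    return ("item", t)

def rimp(a, b):
    """The formula  a o- b  (that is, b linearly implies a)."""
    return ("o-", a, b)

# Difference-list encodings of the list and tree constructors.
def nil(L):
    return lambda l: L(l)

def cons(x, R):
    return lambda L: lambda l: rimp(item(x), rimp(l, R(L)(l)))

def emp(L):
    return lambda l: L(l)

def bt(x, R, S):
    # bt -> (x, R, S, L, l -> R (l' -> item x o- (l' o- (S L l'))) l)
    return lambda L: lambda l: R(lambda k: rimp(item(x), rimp(k, S(L)(k))))(l)

def encode_list(xs):
    enc = nil
    for x in reversed(xs):
        enc = cons(x, enc)
    return enc

def encode_tree(t):
    if t is None:
        return emp
    n, left, right = t
    return bt(n, encode_tree(left), encode_tree(right))

def traverse(t):
    """Functional reading of the three clauses of Figure 7."""
    out = []
    while t is not None:
        n, left, r = t
        if left is None:                 # clause 2: emit N, continue with R
            out.append(n)
            t = r
        else:                            # clause 3: rotate the left subtree up
            m, l1, l2 = left
            t = (m, l1, (n, l2, r))
    return out                           # clause 1: emp yields nil

W = lambda l: ("W", l)
tree = (4, (2, (1, None, None), (3, None, None)), (5, None, None))
items = traverse(tree)
same = encode_tree(tree)(W)("w") == encode_list(items)(W)("w")
```

For this tree the two encodings normalise to the same formula, which is the concrete counterpart of the invariant stated above.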

12. Future work 

Various extensions of the basic scheme described here are natural to
consider. In particular, it should be easy to consider approximating
data structures that contain items of differing types: each of these
types could be mapped into different $\mathit{item}_\alpha(\cdot)$ predicates, one for
each type $\alpha$.

It should also be simple to construct approximating mappings
given the polymorphic typing of a given constructor's type. For
example, if we are given the following declaration for binary trees
(written here in λProlog syntax),

kind btree   type -> type.
type emp     btree A.
type bt      A -> btree A -> btree A -> btree A.

it should be possible to automatically construct the mapping

\[ \begin{array}{rcl}
\mathit{btree} &\mapsto& \lambda x.\ o \\
\mathit{emp} &\mapsto& \bot \\
\mathit{bt} &\mapsto& \lambda x \lambda y \lambda z.\ \mathit{item}_A(x) \parr y \parr z
\end{array} \]

that can, for example, approximate a binary tree with the multiset
of the labels for internal nodes.
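Read semantically, such a mapping replaces each constructor by an operation on multisets. A minimal Python sketch of that reading, assuming multisets are modelled by `collections.Counter`, with $\bot$ as the empty multiset and $\parr$ as multiset union (the function names mirror the constructors and are otherwise hypothetical):

```python
from collections import Counter

def emp():
    # emp -> bot, read as the empty multiset
    return Counter()

def bt(x, y, z):
    # bt -> (x, y, z -> item x par y par z), read as multiset union
    return Counter([x]) + y + z

# A tree whose internal nodes carry the labels a, b, a.
labels = bt("a", bt("b", emp(), emp()), bt("a", emp(), emp()))
```

Here `labels` counts each internal-node label with its multiplicity.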

Abstract interpretation [8] can associate to a program an
approximation to its semantics. Such approximations can help to
determine various kinds of properties of programs. It will be
interesting to see how well the particular notions of collection analysis
can be related to abstract interpretation. More challenging would
be to see to what extent the general methodology described here,
namely the substitution into proofs (computation traces) and the use of linear
logic, can be related to the very general methodology of abstract
interpretation.
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